ABSTRACT 

The effectiveness of a data retrieval method depends on the queries used to retrieve data from the 
database. Iceberg queries are a special class of aggregation queries that compute aggregate values above a 
user-specified threshold. The basic bitmap index technique processes an iceberg query by conducting a 
bitwise-AND operation between every pair of bitmap vectors, which consumes a large amount of execution 
time. Furthermore, this execution time increases as the cardinality of the attributes increases. The 
bitwise-AND operations are therefore the main cost considered in this work: reducing the number of AND 
operations improves the efficiency of iceberg query evaluation. In this work, an efficient iceberg query 
evaluation process is proposed that removes the fruitless bitwise-AND operations otherwise needed to find 
the item pairs. The proposed scheme is an index-based technique. In this scheme, the index positions of the 
1 bits in each bitmap vector are retrieved. The retrieved index positions are then processed further to 
produce the iceberg result for the user-provided threshold. Experimental results on real and synthetic data 
sets show improved execution time for iceberg query evaluation. 
I. INTRODUCTION 

Discovering knowledge or summarized information from operational databases is frequently required by 
top officials and executives in modern business organizations to make better decisions. An aggregation is 
one kind of knowledge representation; it is computed by processing one or more selected attributes of a 
database table and is useful to analysts performing business analysis. 

Iceberg queries were first studied in the database and data mining fields by M. Fang et al. [12]. They 
define an iceberg query (IBQ) as a special class of aggregation query that computes aggregate values above 
a user-provided threshold. The syntax of an IBQ on a relational table R (C1, C2, ..., Cn) is stated below: 

SELECT Ci, Cj, ..., Cm, AGG(*) FROM R GROUP BY Ci, Cj, ..., Cm HAVING AGG(*) >= T 

where Ci, Cj, ..., Cm represent a subset of the attributes in R and are referred to as aggregate attributes. 
AGG represents an aggregation function such as COUNT, SUM, MIN or MAX. The greater-than-or-equal-to 
symbol (>=) is used as the comparison predicate. 
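
To make the semantics of the query concrete, the following is a minimal Python sketch of an iceberg query with AGG = COUNT over all listed columns. The function name iceberg_count and the small table R are hypothetical and are used only for illustration; they do not come from the cited work.

```python
from collections import Counter

def iceberg_count(rows, threshold):
    """Return {group: count} for the groups whose COUNT(*) >= threshold."""
    counts = Counter(rows)  # GROUP BY all listed columns with COUNT(*)
    # HAVING COUNT(*) >= T keeps only the "tip of the iceberg"
    return {group: n for group, n in counts.items() if n >= threshold}

# Hypothetical table R(A, B); the values are made up for illustration.
R = [("a1", "b1"), ("a1", "b1"), ("a2", "b1"),
     ("a1", "b1"), ("a2", "b2"), ("a2", "b1")]

result = iceberg_count(R, threshold=2)
print(result)
```

Only the groups (a1, b1) and (a2, b1) reach the threshold of 2 here; the group (a2, b2) occurs once and is filtered out.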

In this paper we focus on iceberg queries with the aggregation function COUNT, which has the 
anti-monotone property: if the count of a group is below T, the count of any super group must also be below 
T. Iceberg queries are today processed with techniques that do not scale well to large data sets. Hence, it 
is necessary to develop efficient techniques to process them. One simple technique to answer an iceberg 
query is to first aggregate all tuples using the GROUP BY clause and then evaluate the HAVING clause to 
select the qualifying tuples. However, this is difficult when the database table is several times larger 
than main memory. In another technique, the records of the database table are sorted on disk and the sorted 
records are then passed into main memory to form the aggregation, from which the aggregate values that meet 
the specified threshold are selected. If the available memory is smaller than the table, the data must be 
passed over several times from disk, so query evaluation consumes a long execution time. The above 
techniques therefore evaluate iceberg queries only when given long CPU time and sufficient main memory, but 
in practice these two are costly resources in computer systems. 

Bin He et al. [1] evaluated iceberg queries quickly while minimizing CPU and memory usage by using a 
compressed bitmap index. They indexed all the bitmap vectors of the attributes in the selection. A bitmap 
for an attribute in a table can be viewed as a matrix with one row per tuple and one column per distinct 
value of the attribute. The element in the k-th row of a column is 1 if the k-th tuple holds that column's 
value, and 0 otherwise. The bitmap vectors were then compressed using the word-aligned hybrid (WAH) 
compression technique to fit the available memory. Bin He et al. [1] used priority queues (PQs) to evaluate 
iceberg queries efficiently. The bitmap vectors were placed in the priority queues in order of their first 
1-bit position, computed by the function First1bitposition. A bitwise-AND operation was then conducted on 
the vector pairs identified by another function called FindNextAlignedVector. To select the next aligned 
vector pair, push and pop operations are performed repeatedly until either of the PQs becomes empty. 
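
As an illustration of the bitmap matrix described above, the following Python sketch builds an uncompressed bitmap index for one column. The function name build_bitmap_index and the sample column are assumptions made for this example; WAH compression and the priority queues of [1] are not shown.

```python
def build_bitmap_index(column):
    """Map each distinct value to a list of 0/1 bits, one bit per tuple."""
    index = {}
    for row, value in enumerate(column):
        # Create an all-zero vector the first time a value is seen.
        bits = index.setdefault(value, [0] * len(column))
        bits[row] = 1  # the tuple in this row holds this value
    return index

# Hypothetical column A of a five-tuple table.
column_a = ["a1", "a2", "a1", "a3", "a2"]
bitmaps = build_bitmap_index(column_a)
for value, bits in bitmaps.items():
    print(value, bits)
```

Because every tuple holds exactly one value of the attribute, each row position carries a 1 in exactly one of the vectors.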

In this work, we consider iceberg queries with the COUNT function, which satisfies the anti-monotone 
property, stated as "if the count of any group is below the threshold, then the count of its super group 
must be below the threshold". The iceberg query is evaluated by conducting a bitwise-AND operation between 
every pair of bitmap vectors. If the resultant vector of a pair has at least as many 1s as the user-provided 
threshold, the pair is declared an iceberg result; otherwise it is not. The resultant vector may also be an 
empty (zero) vector, that is, a vector that does not contain any 1s. The AND operation is thus wasted in 
every case except the first, where the pair is an iceberg result. Further, the process becomes more 
expensive because the number of bitmap pairs increases as the cardinality of the attributes increases. 
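
The cost of this baseline can be seen in a small Python sketch. It assumes each bitmap vector is held as a Python integer used as a bit set; the function name naive_iceberg and the sample vectors are hypothetical.

```python
def naive_iceberg(bitmaps_a, bitmaps_b, threshold):
    """AND every pair of vectors; return (results, number of AND operations)."""
    results, and_ops = [], 0
    for a_val, a_bits in bitmaps_a.items():
        for b_val, b_bits in bitmaps_b.items():
            and_ops += 1                              # one bitwise-AND per pair
            count = bin(a_bits & b_bits).count("1")   # number of 1s in the result
            if count >= threshold:
                results.append((a_val, b_val, count))
    return results, and_ops

# Hypothetical vectors over six tuples, written as binary literals.
bitmaps_a = {"a1": 0b110101, "a2": 0b001010}
bitmaps_b = {"b1": 0b100101, "b2": 0b011010}

results, and_ops = naive_iceberg(bitmaps_a, bitmaps_b, threshold=2)
print(results, and_ops)
```

Four AND operations are performed for two iceberg results; the pair (a2, b1) yields an empty vector and the pair (a1, b2) falls below the threshold. In general the number of AND operations is the product of the two attribute cardinalities.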

However, the empty bitwise-AND result problem was recently solved by Bin He et al. [1], who developed 
an efficient vector alignment algorithm. The vector alignment algorithm ensures, before any AND operation 
is conducted, that no empty bitwise-AND result will be generated. This greatly improves the IBQ execution 
time by eliminating AND operations between pairs of bitmap vectors whose result is empty (zero). In this 
approach, however, the query evaluation time increases because each pair of vectors must be scanned to find 
an alignment before the bitwise-AND operation on the two vectors takes place, so the search time becomes a 
major factor in finding the item pairs of an iceberg query. 

For example, if A1 is a vector of attribute A and B1, B2 and B3 are vectors of attribute B, then we 
need to find the alignments of A1 with B1, A1 with B2 and A1 with B3. In this case A1 is scanned 3 times 
against the B vectors to find a vector alignment. Search time is thus spent on finding the alignment with 
each B vector before the AND operation can be performed on them. 

In this paper, a new scheme for evaluating iceberg queries is proposed that avoids AND operations 
between pairs of bitmap vectors. The proposed system further simplifies the evaluation process using an 
index-based technique. In this scheme, the index positions of the 1 bits of all bitmap vectors of each 
attribute are retrieved and then processed further to produce the iceberg results for the user-provided 
threshold. With this approach, we can declare iceberg item pairs without performing any bitwise-AND 
operation. Index-based access to the positions of the bitmap vectors reduces the execution time. 

II. LITERATURE SURVEY 

In recent times, the evaluation of iceberg queries has attracted significant research attention due to 
the demand for scalability and efficiency. The related research is reviewed in two subsections. Subsection 
2.1 reviews related work on the bitmap index technology used by different authors. Subsection 2.2 reviews 
related work on iceberg query evaluation using bitmap indices, which is the focus of this paper for the 
optimization of iceberg queries. Subsection 2.3 then describes the existing system. 

2.1 Bitmap index 

The concept of the bitmap index was first introduced by Israel Spiegler et al. [19]. Bitmap indices 
are known to be efficient for accelerating iceberg queries, especially in data warehousing applications and 
in column stores. In data warehouse applications, bitmap indices have been shown to perform better than 
tree-based index schemes, such as the variants of the B-tree or R-tree [13], [15], [17]. 

Compressed bitmap indices are widely used in column-oriented databases, such as C-Store [14], to improve 
performance over row-oriented databases. 
Table 1: An Example of Bitmap Index 
(a) Table R 
(b) Bitmap Indices for A & B 

2.2 Iceberg query evaluation 

Processing of iceberg queries was first studied by Fang et al. [12], who extended probabilistic 
techniques [11] and suggested hybrid and multi-bucket algorithms. Sampling and multiple hash functions were 
used as the basic building blocks of probabilistic techniques such as the scaled-sampling and coarse-count 
algorithms. 

They estimated the sizes of query results in order to predict the valid iceberg results. This improves 
query performance and greatly reduces memory requirements. However, these techniques produce false 
positives and false negatives. To recover from these errors, efficient strategies were designed by 
hybridizing the sampling and coarse-count techniques. The query execution time of the hybrid strategies was 
optimized by extending the linear counting probabilistic algorithm, which counts the number of unique 
values in the presence of duplicates. The linear counting algorithm is based on hashing and allocates a 
bitmap (hash table) of size m in main memory. All entries are initialized to "0". The algorithm then scans 
the relation and applies a hash function to each data value in the column of interest. The hash function 
generates a bitmap address and the algorithm sets the addressed bit to "1". After the scan, the algorithm 
counts the number of empty bitmap entries. It then estimates the column cardinality by dividing this count 
by the bitmap size m and plugging the resulting fraction into its estimator. 
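
The linear counting procedure described above can be sketched in Python as follows. The estimator used here, n = -m ln(V) with V the fraction of empty bitmap entries, is the standard one from the linear counting literature and is an assumption about what the text refers to. The function name linear_count and the choice of zlib.crc32 as the hash function are also assumptions for this example.

```python
import math
import zlib

def linear_count(values, m):
    """Estimate the number of distinct values using a bitmap of size m."""
    bitmap = [0] * m                          # all entries initialized to 0
    for v in values:
        address = zlib.crc32(str(v).encode("utf-8")) % m
        bitmap[address] = 1                   # set the addressed bit
    empty = bitmap.count(0)                   # number of empty entries
    if empty == 0:
        raise ValueError("bitmap is full; choose a larger m")
    v_frac = empty / m                        # fraction of empty entries
    return -m * math.log(v_frac)              # assumed estimator: n = -m ln(V)

# Hypothetical column: 1000 tuples holding 50 distinct values.
values = [i % 50 for i in range(1000)]
estimate = linear_count(values, m=1024)
print(round(estimate, 1))
```

Duplicates hash to the same address, so they do not change the bitmap; only the number of distinct values affects the estimate.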

2.3 Existing System 

In the existing approach, bitwise-AND operations are performed on each pair of bitmap values of 
vectors Ai and Bj once their first aligned position has been found. With that approach, unsuccessful AND 
operations may still be performed on the two vectors. These bitwise-AND operations are wasted and increase 
the execution time. Table 2 depicts the bitmap vectors A1 and B1 before the AND operation is performed. 







Table 2: Bitmap Vectors of A1 and B1 

From Table 2, the first aligned position for vectors A1 and B1 is found at position 4. The bitwise-AND 
operation therefore starts from this position and continues to the end of the two vectors. The resultant 
vector after the bitwise-AND operations is depicted in Table 3. 

Table 3: Bitwise-AND operation between Bitmap Vectors of A1 and B1 

In Table 3, the red shaded cells represent unsuccessful AND operation results. The length of vectors 
A1 and B1 is 17. In total, 9 bitwise-AND operations are performed unnecessarily after the first aligned 
position at 4, since their results are zero. The proposed work aims to overcome this problem. 

III. RESEARCH ELABORATION 

The aggregate attributes mentioned in the iceberg query are read from the database table, and 
equivalent bitmaps are generated for them. Then the index positions of the 1s of bitmaps Am and Bm are 
retrieved. The retrieved position values of Am and Bm are stored in arrays called aIndex and bIndex, 
respectively. The index positions of vector Am are then compared with the index positions of vector Bm. We 
maintain a counter that is incremented by 1 every time a common position of 1s is found, and then compare 
the counter value with the iceberg threshold. If it meets the threshold, we confirm this vector pair as an 
iceberg result and include it in the iceberg result set. 

The index positions in the result vector are then set to zero in the original vectors Am and Bm. The 
updated bitmap vectors Am and Bm are used to compare bit index positions with another bitmap vector in the 
next iteration. Otherwise, if the index count does not pass the threshold, we prune the bitmaps Am and Bm. 
The same process continues until all the vector pairs are completed. 









Example bitmap vectors of A1 and B1: 

Fig. 1 shows the retrieval of all 1-bit index positions of A1 into the aIndex array and of all 1-bit 
index positions of B1 into the bIndex array. 

We explain the proposed work in the following steps: 

Step 1: Retrieve the index positions 

Here aIndex is an array that stores the index positions of all 1 bits of vector A1. The first 1-bit 
position of A1 is pointed to by fp (first position) and the last 1-bit position of A1 by lp (last 
position). We do the same for vector B1: the array bIndex holds the index positions of all 1 bits of B1. 

Step 2: Check the threshold with aIndex and bIndex 

After retrieving the index positions of vectors A1 and B1, we check the lengths of aIndex and bIndex 
against the threshold T. If the length of the aIndex array passes the threshold value T (i.e., length of 
aIndex >= T), then vector A1 is eligible for processing; if not, we simply ignore vector A1 and move on to 
another vector of attribute A (i.e., A2, A3, A4 and so on). We do the same for vector B1: if the length of 
bIndex >= T, then vector B1 is eligible for processing; if not, we ignore vector B1 and select another 
vector of B for processing. 

Step 3: Set the boundaries of fp and lp to compare the two vectors 

Once vectors A1 and B1 pass the threshold value, the comparison process starts. We compare the aIndex 
and bIndex arrays and maintain a counter that counts the occurrences of aIndex values found in the bIndex 
array. 

First we check the fp of aIndex against the fp of bIndex. If aIndex.fp > bIndex.fp, we perform the 
comparison from aIndex.fp and bIndex.fp up to aIndex.lp and bIndex.lp; otherwise we perform the comparison 
from bIndex.fp and aIndex.fp up to bIndex.lp and aIndex.lp. This fixes the boundaries for the comparison 
operations on aIndex and bIndex. 

Step 4: Compare aIndex and bIndex 

Setting fp and lp limits the comparison operation to the range between them. We now compare the aIndex 
array values with the bIndex array values. During this comparison we maintain a counter that counts the 
occurrences. 
If an aIndex array value is found in the bIndex array, we increment the counter by 1 and store that value 
in resIndex (the result index array). The comparison continues to the end of aIndex and bIndex. 

Step 5: Check the threshold with resIndex 

After all comparisons of the aIndex and bIndex arrays, we check the length of the resIndex array, which 
stores the common values. If resIndexLength >= T, we declare the pair A1 and B1 an iceberg result, since at 
this first iteration aIndex and bIndex hold the index values of vectors A1 and B1. 
For example, consider the user-provided threshold value T = 4: 

Fig 1: Index positions retrieved into the arrays aIndex and bIndex 

The result array count passes the threshold, so the pair A1 and B1 is declared an iceberg result. 

Fig. 2 depicts aIndex and bIndex after the result array has been subtracted from them: 

Fig 2: Index positions retrieved into the arrays aIndex and bIndex 

We continue the same process for A1 with the other vectors Bm because A1 still passes the threshold value, 
and we ignore vector B1 in the next iteration with Am because it no longer passes the threshold value. 
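
Steps 2 to 5 can be walked through for a single vector pair with the Python sketch below. The index arrays are hypothetical (the original figures are not reproduced here), and the comparison range is assumed to run from the later of the two first positions to the earlier of the two last positions, which is one reading of Step 3. The helper name common_positions is also an assumption.

```python
from bisect import bisect_left, bisect_right

def common_positions(a_index, b_index):
    """Positions present in both sorted index arrays, compared within bounds."""
    if not a_index or not b_index:
        return []
    lo = max(a_index[0], b_index[0])     # later of the two fp values
    hi = min(a_index[-1], b_index[-1])   # earlier of the two lp values
    a = a_index[bisect_left(a_index, lo):bisect_right(a_index, hi)]
    b = b_index[bisect_left(b_index, lo):bisect_right(b_index, hi)]
    res, i, j = [], 0, 0
    while i < len(a) and j < len(b):     # two-pointer scan of sorted arrays
        if a[i] == b[j]:
            res.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return res

T = 4
a_index = [1, 3, 4, 6, 8, 9, 11, 13]   # hypothetical 1-bit positions of A1
b_index = [0, 3, 4, 6, 9, 10]          # hypothetical 1-bit positions of B1

eligible = len(a_index) >= T and len(b_index) >= T       # Step 2
res_index = common_positions(a_index, b_index) if eligible else []  # Steps 3-4
is_iceberg = len(res_index) >= T                         # Step 5

# Subtract: drop the matched positions from both index arrays.
matched = set(res_index)
a_rest = [p for p in a_index if p not in matched]
b_rest = [p for p in b_index if p not in matched]
print(res_index, is_iceberg, a_rest, b_rest)
```

In this example the pair is an iceberg result with four common positions. After the subtract, four positions remain for A1, so it is still eligible for other B vectors, while only two remain for B1, so it can be ignored in later iterations.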

3.1 Implementation 

The following algorithms describe how the proposed work is implemented: 1. IcebergQueryEvaluation 
2. FindAllOnePositions 3. getCount 4. PerformSubtract 

Proposed algorithms to evaluate an iceberg query using index positions: 

Algorithm 1: IcebergQueryEvaluation 
IcebergQuery (Attribute A, Attribute B, threshold T) 
Output: Iceberg Results 

1. For each bitmap vector a of Attribute A do // this loop finds all 1-bit index positions of vectors a and b 
2.   aIndex[ ], a.countOnes, a.fp, a.lp = findAllOnePositions(a) 
3.   // fp is the first 1-bit position, lp is the last 1-bit position 
4. For each bitmap vector b of Attribute B do 
5.   bIndex[ ], b.countOnes, b.fp, b.lp = findAllOnePositions(b) 
6. While a != null and b != null do // finding all common index positions of a and b 
7.   countRes = 0; k = 0 
8.   If (a.countOnes >= threshold && b.countOnes >= threshold) then 
9.     If (a.fp >= b.fp and a.lp >= b.lp) then 
10.      For i = a.fp to a.lp step 1 do 
11.        If (aIndex[i] = bIndex[i]) then 
12.          Res[k] = i; 
13.          Increment k by 1 // k = k+1 
14.          Increment countRes by 1 // countRes = countRes+1 
15.        End if 
16.    Else 
17.      For i = b.fp to b.lp step 1 do 
18.        If (aIndex[i] = bIndex[i]) then 
19.          Res[k] = i; 
20.          Increment k by 1 // k = k+1 
21.          Increment countRes by 1 // countRes = countRes+1 
22.        End if 
23.    End if 
24.    If (countRes >= threshold) then // IBQ evaluation 
25.      Add Iceberg result (a.value, b.value, countRes) into R 
26.      a, b = performSubtract(aIndex, bIndex, Res) 
27.      a.countOnes = getCount(aIndex, a.fp, a.lp) 
28.      b.countOnes = getCount(bIndex, b.fp, b.lp) 
29.    If (a.countOnes >= threshold) then 
30.      Repeat from step 9 to 30 for a and b.next 
31.    Else 
32.      Repeat from step 9 to 30 for a.next and b 
33. Return R 

The above algorithm evaluates an iceberg query as described in the previous section of this paper. The 
algorithm is divided into three phases. In the first phase, i.e. line numbers 1 to 4, the algorithm finds 
all index positions of 1s in each bitmap vector, returns them in a new array called aIndex[ ], and also 
computes the count of index positions. In the second phase, i.e. line numbers 5 to 22, it finds all common 
index positions of a and b. In the third phase, the remaining lines, the count of common positions is 
checked against the threshold to produce the iceberg results. 
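
A compact Python sketch of the overall evaluation loop is given below. It is an interpretation of Algorithm 1 under stated assumptions rather than the authors' Java implementation: index positions are held in Python sets, common positions are found by set intersection instead of the fp/lp-bounded loop, and the names build_bitmaps, one_positions and evaluate_iceberg are hypothetical.

```python
def build_bitmaps(column):
    """One 0/1 list per distinct value, one bit per tuple."""
    bitmaps = {}
    for row, value in enumerate(column):
        bitmaps.setdefault(value, [0] * len(column))[row] = 1
    return bitmaps

def one_positions(bits):
    """Set of index positions whose bit is 1."""
    return {i for i, bit in enumerate(bits) if bit == 1}

def evaluate_iceberg(bitmaps_a, bitmaps_b, threshold):
    a_pos = {v: one_positions(bits) for v, bits in bitmaps_a.items()}
    b_pos = {v: one_positions(bits) for v, bits in bitmaps_b.items()}
    results = []
    for a_val, a_set in a_pos.items():
        for b_val, b_set in b_pos.items():
            if len(a_set) < threshold:
                break                    # a can no longer reach the threshold
            if len(b_set) < threshold:
                continue                 # b is pruned for this and later pairs
            common = a_set & b_set       # common index positions, no bitwise-AND
            if len(common) >= threshold:
                results.append((a_val, b_val, len(common)))
                a_set -= common          # subtract matched positions in place
                b_set -= common
    return results

# Hypothetical two-column table.
col_a = ["a1", "a1", "a2", "a1", "a2", "a1", "a3", "a2", "a1"]
col_b = ["b1", "b1", "b2", "b1", "b2", "b2", "b1", "b2", "b1"]
results = evaluate_iceberg(build_bitmaps(col_a), build_bitmaps(col_b), threshold=3)
print(results)
```

Pruning on the remaining counts is safe because every tuple belongs to exactly one (A, B) pair, so positions already matched for one pair cannot contribute to any other pair involving the same vector.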



Algorithm 2: FindAllOneBitPositions 
findAllOnePositions (attribute X) 
Output: finding fp, lp, returning xIndex[] 

1. k=0, lp=0, fp=0, countone=0 
2. For i=0 to x.length-1 step 1 
3.   If x[i]=1 then 
4.     xIndex[k]=i 
5.     countone=countone+1 
6.     lp=i 
7.     k=k+1 
8. Next i 
9. fp=xIndex[0] 
10. Return lp, fp, xIndex[], countone 

The above algorithm finds all 1-bit positions of vector X. Lines 2 to 8 scan each bit; whenever a 1 bit 
is found, the corresponding index value is stored in an array called xIndex[]. After scanning all records, 
the algorithm returns the last 1-bit position, the first 1-bit position, and the array that contains all 
index positions of 1s. 
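
A minimal Python counterpart of Algorithm 2 is sketched below. Returning None for fp and lp when the vector contains no 1s is an assumption added for this example, since the pseudocode reads xIndex[0] unconditionally.

```python
def find_all_one_positions(x):
    """Return (x_index, count_one, fp, lp) for a list of 0/1 bits x."""
    x_index = [i for i, bit in enumerate(x) if bit == 1]
    if not x_index:
        # Assumption: an all-zero vector has no first or last 1-bit position.
        return x_index, 0, None, None
    # Positions are collected in increasing order, so the first and last
    # entries are the first and last 1-bit positions.
    return x_index, len(x_index), x_index[0], x_index[-1]

x_index, count_one, fp, lp = find_all_one_positions([0, 1, 0, 1, 1, 0])
print(x_index, count_one, fp, lp)
```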



Algorithm 3: getCountofOnes 
getCount (yIndex[], firstPosition, lastPosition) 
Output: Count the number of positions having 1s in the vector. 

1. count=0; 
2. For i=firstPosition to lastPosition step 1 
3.   If (yIndex[i] != 0) Then 
4.     count=count+1 
5.   End if 
6. Next i 
7. Return count 



The above algorithm finds the total number of index positions stored in the yIndex[] array. It repeats the 
check for each position between firstPosition and lastPosition to find the 1s located in the vector. 

Algorithm 4: performing Subtract operation 
performSubtract (aaIndex, bbIndex, Res) 
Output: returning updated aaIndex, bbIndex. 

For i=0 to aaIndex.length-1 step 1 
  If (aaIndex[i]=Res[i]) then 
    aaIndex[i]=0 
  End if 
Next i 
For i=0 to bbIndex.length-1 step 1 
  If (bbIndex[i]=Res[i]) then 
    bbIndex[i]=0 
  End if 
Next i 
Return updated aaIndex, bbIndex 



The above algorithm updates the aaIndex and bbIndex array values by comparing the Res[] index array with 
them. If an aaIndex array value equals a Res[] array value, the aaIndex[] array value is changed to zero, 
and the same is applied to the bbIndex[] array. 
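
The subtract and count operations can be sketched in Python as follows. This sketch deliberately differs from the pseudocode in one respect, which is an assumption of the example: matched positions are removed from the arrays instead of being overwritten with zero, because 0 is itself a valid index position and would be ambiguous as a marker.

```python
def perform_subtract(a_index, b_index, res):
    """Return copies of both index arrays without the positions listed in res."""
    matched = set(res)
    return ([p for p in a_index if p not in matched],
            [p for p in b_index if p not in matched])

def get_count(index, first_position, last_position):
    """Count the remaining positions p with first_position <= p <= last_position."""
    return sum(1 for p in index if first_position <= p <= last_position)

# Hypothetical index arrays that share positions 0 and 5.
a_rest, b_rest = perform_subtract([0, 2, 5, 7], [0, 5, 6], res=[0, 5])
print(a_rest, b_rest, get_count(a_rest, 0, 7))
```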

IV. RESULT & DISCUSSION 

Our experiments were carried out with both a synthetic data set and a real patent data set. We generate 
large synthetic data according to five parameters: data size (i.e., number of tuples), attribute value distributions, 
number of distinct values, number of attributes in the table, and attribute lengths. In each experiment, we 
usually focus on the impact of one parameter with respect to data sizes and thus fix the other parameters. 

4.1 Experimental set up 

The experiments were conducted on a computer system with a Core 2 Duo processor at 2.0 GHz and 1.0 GB 
of main memory, running Microsoft Windows XP, and all algorithms were implemented in Java. Each experiment 
was repeated at least 3 to 4 times for the same threshold on a synthetic data set of 1 lakh (100,000) 
records. The experiments were conducted for high threshold values, from 1000 to 10000. 

4.2 Experimentation 

In our experimentation, we observed that the execution time decreased as the threshold value 
increased. We also observed that the array indexing technique gives faster results than the existing 
approach. Fig. 3 shows the experimental results observed for IBQ evaluation using the bitmap index 
technique and the array indexing technique: 

Fig 3: Execution times of the iceberg bitmap index technique and the iceberg array index technique for 
varying thresholds. 

4.3 Comparative Analysis 

The existing method refers to the efficient iceberg query evaluation using a compressed bitmap index by 
Bin He et al. [1]. The comparison study is based on the responses of the two algorithms on the retail 
dataset. Both algorithms are based on the bitmap index table and the AND operation between the attributes. 
The aim of the algorithms is to reduce the execution time by reducing unwanted AND operations. Fig. 3 shows 
the comparative analysis of the proposed iceberg evaluation algorithm and the existing iceberg evaluation 
algorithm. In these experiments, the proposed iceberg evaluation responds more quickly than the existing 
approach of [1]. 

V. CONCLUSION 

In the field of information retrieval, retrieving data from a database has become a more 
time-consuming process as the volume of data increases day by day. Specialized queries are used for 
retrieving data from databases. Iceberg queries are such queries; they use an aggregate function and 
conditional clauses to get data from databases quickly. The proposed iceberg query evaluation algorithm 
uses the bitmap index table for iceberg evaluation. The algorithm has two important aspects. The first is 
declaring a vector pair an iceberg result without performing any AND operation. The second is replacing AND 
operations with an indexing approach that accesses elements faster. As a further direction, reducing the 
execution time by eliminating more of the unproductive bitwise-AND operations will be taken up. In our 
experiments, the algorithm demonstrates better performance than the existing scheme, and it does not depend 
on any particular method. To solve the problem of massive empty and fruitless AND results, we proposed an 
efficient indexing approach that gives iceberg results without performing any bitwise-AND operations. 